Background: Coalescent simulation is pivotal for understanding population evolutionary models and demographic histories, as well as for developing novel analytical methods for genetic association studies of DNA sequence data. Many coalescent simulators have been developed, but selecting the most appropriate program remains challenging.
Results: We extensively compared the performance of five widely used coalescent simulators (Hudson's ms, msHOT, MaCS, Simcoal2, and fastsimcoal) to provide a practical guide based on three crucial factors: 1) speed, 2) scalability, and 3) accuracy of recombination hotspot position and intensity. Although ms is a popular standard coalescent simulator, it cannot simulate sequences with recombination hotspots. An extended program, msHOT, compensates for this deficiency by incorporating recombination hotspots and gene conversion events at arbitrarily chosen locations and intensities, but remains limited in simulating long stretches of DNA sequence. Simcoal2, based on a discrete generation-by-generation approach, can simulate more complex demographic scenarios, but runs comparatively slowly. MaCS and fastsimcoal, both built on fast, modified sequential Markov coalescent algorithms that approximate the standard coalescent, are much more efficient while keeping the salient features of msHOT and Simcoal2, respectively. Our simulations demonstrate that they are advantageous over the other programs across a spectrum of evolutionary models. To validate recombination hotspots, the LDhat 2.2 rhomap package, sequenceLDhot, and Haploview were compared for hotspot detection, and sequenceLDhot exhibited the best performance on both real and simulated data.
Conclusions: While ms remains an excellent choice for general coalescent simulations of DNA sequences, MaCS and fastsimcoal are much more scalable and flexible in simulating a variety of demographic events under different recombination hotspot models. Furthermore, sequenceLDhot appears to give the best performance in detecting and validating cross-over hotspots.